75 research outputs found

    Accelerating Training of Deep Neural Networks via Sparse Edge Processing

    We propose a reconfigurable hardware architecture for deep neural networks (DNNs) capable of online training and inference, which uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational requirements. This novel architecture introduces the notion of edge-processing to provide flexibility and combines junction pipelining and operational parallelization to speed up training. The overall effect is to reduce network complexity by factors up to 30x and training time by up to 35x relative to GPUs, while maintaining high fidelity of inference results. This has the potential to enable extensive parameter searches and development of the largely unexplored theoretical foundation of DNNs. The architecture automatically adapts itself to different network sizes given available hardware resources. As proof of concept, we show results obtained for different bit widths. Comment: Presented at the 26th International Conference on Artificial Neural Networks (ICANN) 2017 in Alghero, Italy

    Covering conditions and algorithms for the synthesis of speed-independent circuits

    This paper presents theory and algorithms for the synthesis of standard C-implementations of speed-independent circuits. These implementations are block-level circuits which may consist of atomic gates to perform complex functions in order to ensure hazard freedom. First, we present Boolean covering conditions that guarantee that the standard C-implementations operate correctly. Then, we present two algorithms that produce optimal solutions to the covering problem. The first algorithm is always applicable, but does not complete on large circuits. The second algorithm, motivated by our observation that our covering problem can often be solved with a single cube, finds the optimal single-cube solution when such a solution exists. When applicable, the second algorithm is dramatically more efficient than the first, more general algorithm. We present results for benchmark specifications which indicate that our single-cube algorithm is applicable on most benchmark circuits and reduces run times by over an order of magnitude. The block-level circuits generated by our algorithms are a good starting point for tools that perform technology mapping to obtain gate-level speed-independent circuits.

    Technology mapping of timed circuits

    This paper presents an automated procedure for the technology mapping of timed circuits to practical gate libraries. Timed circuits are a class of asynchronous circuits that incorporate explicit timing information in the specification which is used throughout the design process to optimize the implementation. Our procedure begins with a timed specification and a delay-annotated gate library description which must include 2-input AND gates, OR gates, and C-elements, but optionally can include higher-fanin gates, AND-OR-INVERT blocks, and generalized C-elements. Our procedure first generates a technology-independent timed circuit netlist composed of possibly high-fanin AND gates, OR gates, and 2-input C-elements. The procedure then investigates simultaneous decompositions of all high-fanin gates by adding state variables to the specification and performing resynthesis. Although multiple decompositions are explored, timing information is utilized to significantly reduce their number. Once all gates are sufficiently decomposed, the netlist can be mapped to the given gate library, taking advantage of any compact complex gates available. The decomposition and resynthesis steps have been fully automated within the synthesis tool ATACS, and we present results for several examples.

    Morse Code Datasets for Machine Learning

    We present an algorithm to generate synthetic datasets of tunable difficulty on classification of Morse code symbols for supervised machine learning problems, in particular, neural networks. The datasets are spatially one-dimensional and have a small number of input features, leading to high density of input information content. This makes them particularly challenging when implementing network complexity reduction methods. We explore how network performance is affected by deliberately adding various forms of noise and expanding the feature set and dataset size. Finally, we establish several metrics to indicate the difficulty of a dataset, and evaluate their merits. The algorithm and datasets are open-source. Comment: Presented at the 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT